# PSBench

**PSBench** is a benchmark suite for developing and training methods for estimating (predicting) the accuracy of protein complex structural models.

## Included Datasets

PSBench includes five main datasets: four drawn from two CASP competitions (CASP15 and CASP16) and one built from recent PDB depositions:

### 1. CASP15\_inhouse\_dataset
- **Source**: MULTICOM3 models from CASP15 (2022)
- **Total Models**: 7,885
- **Includes**: Summary of the dataset, Fasta sequences, AlphaFold features, predicted models, and quality scores

### 2. CASP15\_community\_dataset
- **Source**: All group submissions from CASP15
- **Total Models**: 10,942
- **Includes**: Summary of the dataset, Fasta sequences, predicted models, and quality scores

### 3. CASP16\_inhouse\_dataset
- **Source**: MULTICOM4 models from CASP16 (2024)
- **Total Models**: 1,009,050
- **Includes**: Summary of the dataset, Fasta sequences, AlphaFold features, predicted models, and quality scores

### 4. CASP16\_community\_dataset
- **Source**: All group submissions from CASP16
- **Total Models**: 12,904
- **Includes**: Summary of the dataset, Fasta sequences, predicted models, and quality scores

### 5. Multimer\_7\_2024\_8\_2025\_dataset
- **Source**: AlphaFold3-predicted multimeric protein structures for non-redundant RCSB Protein Data Bank (PDB) depositions between July 2024 and August 2025
- **Total Models**: 400,400
- **Includes**: Summary of the dataset, Fasta sequences, AlphaFold features, predicted models, and quality scores

In addition, PSBench includes CASP15\_inhouse\_TOP5\_dataset (a subset of CASP15\_inhouse\_dataset) and CASP16\_inhouse\_TOP5\_dataset (a subset of CASP16\_inhouse\_dataset). These subsets were used to train and test GATE, a graph transformer method for estimation of model accuracy (EMA). They omit the Predicted\_Models/ subdirectory to reduce redundancy and save space.

## Quality scores (labels)
For each structural model in the datasets, we provide 10 unique quality scores as labels:


- **Global Quality Scores**: tmscore (4 variants), rmsd
- **Local Quality Scores**: lddt
- **Interface Quality Scores**: ics, ics\_precision, ics\_recall, ips, qs\_global, qs\_best, dockq\_wave

## Additional features

For CASP15\_inhouse\_dataset and CASP16\_inhouse\_dataset, their subsets (CASP15\_inhouse\_TOP5\_dataset and CASP16\_inhouse\_TOP5\_dataset), and Multimer\_7\_2024\_8\_2025\_dataset, the following additional features are provided for each model:


- **model\_type**: Model type (AlphaFold-multimer or AlphaFold3-based)
- **afm\_confidence\_score**: AlphaFold-multimer confidence score
- **af3\_ranking\_score**: AlphaFold3 ranking score
- **iptm**: Interface predicted Template Modeling score
- **num\_inter\_pae**: Number of inter-chain predicted aligned error values below 5 Å
- **mpDockQ/pDockQ**: Predicted multimer DockQ score
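
To check which of the label and feature names listed above actually appear as columns in a given file, it can help to print the header row. The sketch below assumes the file is tab-separated with a single header line, which this README does not confirm; `list_columns` is a hypothetical helper name, not part of PSBench.

```bash
# Print the column names of a tab-separated table, one per line.
# Assumption: the first line of the file is a header row and fields
# are separated by tab characters.
list_columns() {
    local table="$1"
    # Read only the first line and put each tab-separated field on its own line.
    head -n 1 "$table" | tr '\t' '\n'
}

# Example (path is illustrative):
#   list_columns CASP15_inhouse_dataset/CASP15_inhouse_dataset_summary.tab
```

If a file uses a different delimiter, the `tr` argument would need to change accordingly.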


## Instructions for Using the Datasets

**1. Extract the main archive**  
   Decompress each `.tar.gz` file.  
   - **Linux/macOS**:  
     ```bash
     tar -xzf CASP16_inhouse_dataset.tar.gz
     ```  
   - **Windows**: Use a tool like [7-Zip](https://www.7-zip.org/).

**2. Extract individual target archives**  
   Within each dataset, navigate to the `Predicted_Models/` directory. This directory contains compressed files for each target (e.g., `H1202.tar.gz`). These archives should also be extracted to access the predicted models.
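
Step 2 can be scripted rather than done one archive at a time. The following is a minimal sketch, assuming every per-target archive sits directly inside `Predicted_Models/` and ends in `.tar.gz`; `extract_targets` is a hypothetical helper name, and the layout inside each archive is not specified here.

```bash
# Extract every per-target archive (e.g. H1202.tar.gz) found in a
# Predicted_Models/ directory, unpacking each one into that directory.
extract_targets() {
    local models_dir="$1"
    local archive
    if [ ! -d "$models_dir" ]; then
        echo "not a directory: $models_dir" >&2
        return 1
    fi
    for archive in "$models_dir"/*.tar.gz; do
        # If the glob matched nothing, the literal pattern is returned; skip it.
        [ -e "$archive" ] || continue
        tar -xzf "$archive" -C "$models_dir"
    done
}

# Example (path is illustrative):
#   extract_targets CASP16_inhouse_dataset/Predicted_Models
```

The original archives are left in place, so they can be deleted afterwards if disk space is a concern.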


## The dataset directory structure

After the datasets are downloaded from Harvard Dataverse and uncompressed, the directory structure should look like this:

```text
📁 PSBench/
├── 📁 CASP15_inhouse_dataset/
│   ├── 📄 CASP15_inhouse_dataset_summary.tab
│   ├── 📁 AlphaFold_Features/
│   ├── 📁 Fasta/
│   ├── 📁 Predicted_Models/
│   └── 📁 Quality_Scores/
├── 📁 CASP15_inhouse_TOP5_dataset/
│   ├── 📄 CASP15_inhouse_TOP5_dataset_summary.tab
│   ├── 📁 AlphaFold_Features/
│   ├── 📁 Fasta/
│   └── 📁 Quality_Scores/
├── 📁 CASP15_community_dataset/
│   ├── 📄 CASP15_community_dataset_summary.tab
│   ├── 📁 Fasta/
│   ├── 📁 Predicted_Models/
│   └── 📁 Quality_Scores/
├── 📁 CASP16_inhouse_dataset/
├── 📁 CASP16_inhouse_TOP5_dataset/
├── 📁 CASP16_community_dataset/
├── 📁 Multimer_7_2024_8_2025_dataset/
└── 📄 README.md
```
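
A quick way to confirm that a download unpacked as expected is to check for the entries common to every dataset shown in the tree above: the `<name>_summary.tab` file, `Fasta/`, and `Quality_Scores/`. This is a sketch only; `check_dataset_layout` is a hypothetical helper, and it assumes the datasets whose contents are not expanded in the tree follow the same naming pattern.

```bash
# Report any of the common entries missing from one dataset directory.
# Returns 0 if all are present, 1 otherwise.
check_dataset_layout() {
    local dataset_dir="${1%/}"   # drop a trailing slash, if any
    local name
    name=$(basename "$dataset_dir")
    local status=0
    local entry
    for entry in "${name}_summary.tab" "Fasta" "Quality_Scores"; do
        if [ ! -e "$dataset_dir/$entry" ]; then
            echo "missing: $dataset_dir/$entry" >&2
            status=1
        fi
    done
    return "$status"
}

# Example (path is illustrative):
#   check_dataset_layout PSBench/CASP15_community_dataset
```

`Predicted_Models/` and `AlphaFold_Features/` are deliberately not checked, since not every dataset includes them.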

## Reference

Neupane, P., Liu, J., & Cheng, J. PSBench: a large-scale benchmark for estimating the accuracy of protein complex structural models. The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS), 2025.

